Support Vector Machines for Speech Recognition
Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling, which can be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.
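The structural risk minimization principle mentioned above is usually realized through the standard soft-margin SVM objective (the generic formulation, not a detail taken from this dissertation):

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^2 \;+\; C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.}\quad y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0
```

The $\|w\|^2$ term bounds model complexity (the "structural" part of the risk), while the slack variables $\xi_i$, weighted by $C$, trade margin violations against training error; this is what lets an SVM control generalization directly rather than only fitting the observation densities.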
Implementing contextual biasing in GPU decoder for online ASR
GPU decoding significantly accelerates the output of ASR predictions. While
GPUs are already being used for online ASR decoding, post-processing and
rescoring on GPUs have not been properly investigated yet. Rescoring with
available contextual information can considerably improve ASR predictions.
Previous studies have proven the viability of lattice rescoring in decoding and
biasing language model (LM) weights in offline and online CPU scenarios. In
real-time GPU decoding, partial recognition hypotheses are produced without
lattice generation, which makes the implementation of biasing more complex. The
paper proposes and describes an approach to integrate contextual biasing in
real-time GPU decoding while exploiting the standard Kaldi GPU decoder. Besides
the biasing of partial ASR predictions, our approach also permits dynamic
context switching allowing a flexible rescoring per each speech segment
directly on GPU. The code is publicly released and tested with open-sourced
test sets. Comment: Accepted to Interspeech 202
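The biasing of partial hypotheses without lattices can be pictured as a simple rescoring pass over each partial result, with a per-segment context list. The sketch below is purely illustrative; the function and the `segment_contexts` mapping are hypothetical and do not reflect the Kaldi GPU decoder API.

```python
def bias_partial_hypothesis(tokens, base_score, context_phrases, boost=2.0):
    """Add a fixed bonus to a partial hypothesis score for every
    context phrase found in the decoded token sequence.
    Illustrative only -- not the Kaldi interface."""
    text = " ".join(tokens)
    bonus = sum(boost for phrase in context_phrases if phrase in text)
    return base_score + bonus

# Dynamic context switching: a different phrase list per speech segment.
segment_contexts = {
    "seg1": ["acme corp", "john doe"],
    "seg2": ["kaldi", "gpu decoder"],
}

hyp = ["call", "john", "doe", "now"]
rescored = bias_partial_hypothesis(
    hyp, base_score=-12.5, context_phrases=segment_contexts["seg1"]
)
```

Because partial hypotheses arrive continuously during online decoding, the bonus must be applied incrementally per segment rather than once on a final lattice, which is the complication the paper addresses.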
Risk Minimization Approaches in Signal Processing
Statistical techniques based on Hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. SVMs have been shown to provide significant improvements in performance on small pattern recognition tasks compared to a number of conventional approaches. SVMs, however, require ad hoc (and unreliable) methods to couple them to probabilistic learning machines. Probabilistic Bayesian learning machines, such as the relevance vector machine (RVM), are fairly new approaches that attempt to overcome the deficiencies of SVMs by explicitly accounting for sparsity and statistics in their formulation. In the proposed paper, we will review the past 30 years of research into these new learning machines, and describe how they can be used to solve many traditional signal processing problems. Unifying themes in this work are the concepts of risk minimization and margin maximization, which can be viewed as a generalization of the maximum likelihood principle so fundamental to many signal processing approaches. It is our belief that this information has not been previously explained in a way that makes it accessible to mainstream signal processing researchers, so we believe this paper will have significant tutorial value.
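The contrast between margin maximization and maximum likelihood can be seen directly in the two standard surrogate losses: the SVM's hinge loss is exactly zero once an example clears the margin, whereas the negative log-likelihood of a logistic model penalizes every example, however confidently classified. A minimal numeric sketch, using the textbook formulas:

```python
import math

def hinge_loss(score, label):
    # SVM hinge loss: zero once the example is beyond the unit margin.
    return max(0.0, 1.0 - label * score)

def log_loss(score, label):
    # Negative log-likelihood of a logistic model: strictly positive
    # for any finite score, so every example keeps pulling on the fit.
    return math.log(1.0 + math.exp(-label * score))

# A confidently correct example (score 2.0, label +1) incurs no hinge
# loss but still carries a small likelihood penalty.
h = hinge_loss(2.0, +1)   # 0.0
l = log_loss(2.0, +1)     # ln(1 + e^-2), roughly 0.127
```

This zeroing-out is what concentrates the SVM solution on the boundary examples (the support vectors), while maximum likelihood spreads influence over all of the data.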